RIP Technique for Frequent Itemset Mining
Not recorded
Abstract
Data mining is a rapidly expanding field applied in many disciplines, from remote sensing and geographical information systems to computer cartography and environmental assessment and planning. Rule mining is a powerful technique for discovering interesting associations between attributes contained in a database (Han et al., 2006). Association rules can have one or several output attributes, and an output attribute of one rule can be used as the input of another. Association rules are thus useful both for obtaining an idea of what concept structures exist in the data (as with unsupervised clustering) and for model creation. In the latter case, the generated rules provide the underlying concepts used in the construction of decision trees and even neural networks, although this is carried out by the automated learning process. Sequential pattern mining also falls under association rule mining.
For a given transaction database D, an association rule is an expression of the form X → Y, where X and Y are subsets of the attribute set A. The rule X → Y holds with confidence t if t% of the transactions in D that support X also support Y. The rule X → Y has support δ in D if δ% of the transactions in D support X ∪ Y. Association rule mining (ARM) proceeds in two steps. In the first step, frequent patterns with respect to a support threshold (known as min sup) are mined. In the second step, association rules are generated with respect to a minimum confidence. Many variants of ARM-based algorithms have been developed.
The proposed technique is termed Relative Item Path (RIP) based ARM. It creates an innovative graphical structure that is dynamically updated for each transaction in order to determine the associated frequent itemsets. This technique scores over efficient existing techniques proposed in recent years.
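The support and confidence definitions and the two-step procedure described above can be illustrated with a minimal Python sketch. It is a brute-force illustration, not the RIP technique: the function names (`support`, `confidence`, `mine_rules`), the `max_size` cap, and the toy transactions are all assumptions introduced here for illustration.

```python
from itertools import combinations


def support(itemset, transactions):
    """Percentage of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return 100.0 * hits / len(transactions)


def confidence(x, y, transactions):
    """Percentage of the transactions supporting X that also support Y."""
    x, y = frozenset(x), frozenset(y)
    covers_x = [t for t in transactions if x <= t]
    if not covers_x:
        return 0.0
    hits = sum(1 for t in covers_x if y <= t)
    return 100.0 * hits / len(covers_x)


def mine_rules(transactions, min_sup, min_conf, max_size=3):
    """Two-step ARM by exhaustive enumeration (hypothetical helper)."""
    items = sorted(set().union(*transactions))
    # Step 1: keep every itemset whose support reaches min_sup.
    frequent = [
        frozenset(c)
        for k in range(1, max_size + 1)
        for c in combinations(items, k)
        if support(c, transactions) >= min_sup
    ]
    # Step 2: split each frequent itemset into X -> Y and keep the
    # rules whose confidence reaches min_conf.
    rules = []
    for itemset in frequent:
        for k in range(1, len(itemset)):
            for x in combinations(sorted(itemset), k):
                x = frozenset(x)
                y = itemset - x
                if confidence(x, y, transactions) >= min_conf:
                    rules.append((x, y))
    return frequent, rules


# Toy transaction database (made up for illustration).
D = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent, rules = mine_rules(D, min_sup=50, min_conf=60)
```

Exhaustive enumeration rescans the database for every candidate itemset, which is exactly the cost that graph-based approaches such as the one proposed here aim to reduce.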
In the proposed technique, each transaction updates the graph created by the previous transactions, modifying the RIP value associated with the corresponding link. This reduces the number of database scans required to determine the associated frequent itemsets, saving a great deal of the time consumed by database access. The unique feature of the RIP graph is that its number of nodes equals the total number of items only. This significantly reduces the processing time and memory space required for ARM. The technique works optimally for small and moderate-size databases. Large databases give rise to enhanced RIPs, which are cumbersome to update. Still, the savings in database accesses and the efficient handling of the generated RIP graph make the proposed technique a strong candidate for ARM.
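The idea of a graph with one node per item that each transaction updates in a single database scan can be sketched as follows. The abstract does not define how the RIP value is computed, so this sketch substitutes a plain co-occurrence count on each link; the class `ItemGraph` and its methods are hypothetical names, and the code should not be read as the authors' RIP structure.

```python
from collections import defaultdict
from itertools import combinations


class ItemGraph:
    """One node per item; each link stores a co-occurrence count.

    Assumption: the count is a stand-in for the RIP value, which the
    abstract does not specify.
    """

    def __init__(self):
        self.item_count = defaultdict(int)  # node -> transactions containing it
        self.link_count = defaultdict(int)  # (a, b) with a < b -> co-occurrences
        self.n_transactions = 0

    def update(self, transaction):
        """Fold one transaction into the graph (single pass over the data)."""
        items = sorted(set(transaction))
        self.n_transactions += 1
        for item in items:
            self.item_count[item] += 1
        for a, b in combinations(items, 2):
            self.link_count[(a, b)] += 1

    def frequent_pairs(self, min_sup):
        """Links whose support (in percent) reaches min_sup."""
        threshold = min_sup / 100.0 * self.n_transactions
        return {p: c for p, c in self.link_count.items() if c >= threshold}


# Toy transaction database (made up for illustration).
graph = ItemGraph()
for t in [{"bread", "milk"}, {"bread", "butter"},
          {"bread", "milk", "butter"}, {"milk"}]:
    graph.update(t)
pairs = graph.frequent_pairs(min_sup=50)
```

The node set never grows beyond the number of distinct items, which mirrors the property stated above. Pairwise counts alone only identify frequent 2-itemsets; how the RIP value encodes longer itemsets is not described in the abstract and is left open here.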
Similar resources
Ramp: High Performance Frequent Itemset Mining with Efficient Bit-Vector Projection Technique
Mining frequent itemset using bit-vector representation approach is very efficient for small dense datasets, but highly inefficient for sparse datasets due to lack of any efficient bit-vector projection technique. In this paper we present a novel efficient bit-vector projection technique, for sparse and dense datasets. We also present a new frequent itemset mining algorithm Ramp (Real Algorithm...
Ramp: Fast Frequent Itemset Mining with Efficient Bit-Vector Projection Technique
Mining frequent itemset using bit-vector representation approach is very efficient for dense type datasets, but highly inefficient for sparse datasets due to lack of any efficient bit-vector projection technique. In this paper we present a novel efficient bit-vector projection technique, for sparse and dense datasets. To check the efficiency of our bit-vector projection technique, we present a ...
A Survey on Moving Towards Frequent Pattern Growth for Infrequent Weighted Itemset Mining
Data Mining and knowledge discovery is one of the important areas. In this paper we are presenting a survey on various methods for frequent pattern mining. From the past decade, frequent pattern mining plays a very important role but it does not consider the weight factor or value of the items. The very first and basic technique to find the correlation of data is Association Rule Mining. In ARM...
Accelerating Closed Frequent Itemset Mining by Elimination of Null Transactions
The mining of frequent itemsets is often challenged by the length of the patterns mined and also by the number of transactions considered for the mining process. Another acute challenge that concerns the performance of any association rule mining algorithm is the presence of "null" transactions. This work proposes a closed frequent itemset mining algorithm viz., Closed Frequent Itemset Mining a...
A New Algorithm for High Average-utility Itemset Mining
High utility itemset mining (HUIM) is a new emerging field in data mining which has gained growing interest due to its various applications. The goal of this problem is to discover all itemsets whose utility exceeds minimum threshold. The basic HUIM problem does not consider length of itemsets in its utility measurement and utility values tend to become higher for itemsets containing more items...